{
    "cells": [
        {
            "attachments": {},
            "cell_type": "markdown",
            "id": "a0d0efc8-3e03-4f66-8f6d-a907b6b6d7c1",
            "metadata": {},
            "source": [
                "# Recursive Retriever + Query Engine Demo \n",
                "\n",
                "In this demo, we walk through a use case of showcasing our \"RecursiveRetriever\" module over hierarchical data.\n",
                "\n",
                "The concept of recursive retrieval is that we not only explore the directly most relevant nodes, but also explore\n",
                "node relationships to additional retrievers/query engines and execute them. For instance, a node may represent a concise summary of a structured table,\n",
                "and link to a SQL/Pandas query engine over that structured table. Then if the node is retrieved, we want to also query the underlying query engine for the answer.\n",
                "\n",
                "This can be especially useful for documents with hierarchical relationships. In this example, we walk through a Wikipedia article about billionaires (in PDF form), which contains both text and a variety of embedded structured tables. We first create a Pandas query engine over each table, but also represent each table by an `IndexNode` (stores a link to the query engine); this Node is stored along with other Nodes in a vector store. \n",
                "\n",
                "During query-time, if an `IndexNode` is fetched, then the underlying query engine/retriever will be queried. \n",
                "\n",
                "**Notes about Setup**\n",
                "\n",
                "We use `camelot` to extract text-based tables from PDFs."
            ]
        },
        {
            "cell_type": "code",
            "execution_count": 1,
            "id": "199ec60c-ea45-46d3-ba21-66db6e16726f",
            "metadata": {
                "tags": []
            },
            "outputs": [],
            "source": [
                "import camelot\n",
                "from llama_index import Document, SummaryIndex\n",
                "\n",
                "# https://en.wikipedia.org/wiki/The_World%27s_Billionaires\n",
                "from llama_index import VectorStoreIndex, ServiceContext, LLMPredictor\n",
                "from llama_index.query_engine import PandasQueryEngine, RetrieverQueryEngine\n",
                "from llama_index.retrievers import RecursiveRetriever\n",
                "from llama_index.schema import IndexNode\n",
                "from llama_index.llms import OpenAI\n",
                "\n",
                "from llama_hub.file.pymu_pdf.base import PyMuPDFReader\n",
                "from pathlib import Path\n",
                "from typing import List"
            ]
        },
        {
            "attachments": {},
            "cell_type": "markdown",
            "id": "c41f1a9c-939c-4a89-866e-557f43fc330b",
            "metadata": {},
            "source": [
                "## Load in Document (and Tables)\n",
                "\n",
                "We use our `PyMuPDFReader` to read in the main text of the document.\n",
                "\n",
                "We also use `camelot` to extract some structured tables from the document"
            ]
        },
        {
            "cell_type": "code",
            "execution_count": 2,
            "id": "4435fcb9-bf67-4d76-9d38-f5cd19086fae",
            "metadata": {
                "tags": []
            },
            "outputs": [],
            "source": [
                "file_path = \"billionaires_page.pdf\""
            ]
        },
        {
            "cell_type": "code",
            "execution_count": 3,
            "id": "e5e78b24-42e0-4d65-980d-13f8c738012f",
            "metadata": {
                "tags": []
            },
            "outputs": [],
            "source": [
                "# initialize PDF reader\n",
                "reader = PyMuPDFReader()"
            ]
        },
        {
            "cell_type": "code",
            "execution_count": 4,
            "id": "09976ddc-fe47-4eb6-b577-d5c1c9185974",
            "metadata": {
                "tags": []
            },
            "outputs": [],
            "source": [
                "docs = reader.load(Path(file_path))"
            ]
        },
        {
            "cell_type": "code",
            "execution_count": 5,
            "id": "5f086ec9-17e3-40cb-b92d-4e121322ff51",
            "metadata": {
                "tags": []
            },
            "outputs": [],
            "source": [
                "# use camelot to parse tables\n",
                "def get_tables(path: str, pages: List[int]):\n",
                "    table_dfs = []\n",
                "    for page in pages:\n",
                "        table_list = camelot.read_pdf(path, pages=str(page))\n",
                "        table_df = table_list[0].df\n",
                "        table_df = (\n",
                "            table_df.rename(columns=table_df.iloc[0])\n",
                "            .drop(table_df.index[0])\n",
                "            .reset_index(drop=True)\n",
                "        )\n",
                "        table_dfs.append(table_df)\n",
                "    return table_dfs"
            ]
        },
        {
            "cell_type": "code",
            "execution_count": 6,
            "id": "f6653cb1-d0f1-4408-94d3-31827e1a3115",
            "metadata": {
                "tags": []
            },
            "outputs": [],
            "source": [
                "table_dfs = get_tables(file_path, pages=[3, 25])"
            ]
        },
        {
            "cell_type": "code",
            "execution_count": 7,
            "id": "24e90368-b982-4d05-91b3-0d9dae39eebc",
            "metadata": {
                "tags": []
            },
            "outputs": [
                {
                    "data": {
                        "text/html": [
                            "<div>\n",
                            "<style scoped>\n",
                            "    .dataframe tbody tr th:only-of-type {\n",
                            "        vertical-align: middle;\n",
                            "    }\n",
                            "\n",
                            "    .dataframe tbody tr th {\n",
                            "        vertical-align: top;\n",
                            "    }\n",
                            "\n",
                            "    .dataframe thead th {\n",
                            "        text-align: right;\n",
                            "    }\n",
                            "</style>\n",
                            "<table border=\"1\" class=\"dataframe\">\n",
                            "  <thead>\n",
                            "    <tr style=\"text-align: right;\">\n",
                            "      <th></th>\n",
                            "      <th>No.</th>\n",
                            "      <th>Name</th>\n",
                            "      <th>Net worth\\n(USD)</th>\n",
                            "      <th>Age</th>\n",
                            "      <th>Nationality</th>\n",
                            "      <th>Primary source(s) of wealth</th>\n",
                            "    </tr>\n",
                            "  </thead>\n",
                            "  <tbody>\n",
                            "    <tr>\n",
                            "      <th>0</th>\n",
                            "      <td>1</td>\n",
                            "      <td>Bernard Arnault &amp;\\nfamily</td>\n",
                            "      <td>$211 billion</td>\n",
                            "      <td>74</td>\n",
                            "      <td>France</td>\n",
                            "      <td>LVMH</td>\n",
                            "    </tr>\n",
                            "    <tr>\n",
                            "      <th>1</th>\n",
                            "      <td>2</td>\n",
                            "      <td>Elon Musk</td>\n",
                            "      <td>$180 billion</td>\n",
                            "      <td>51</td>\n",
                            "      <td>United\\nStates</td>\n",
                            "      <td>Tesla, SpaceX, X Corp.</td>\n",
                            "    </tr>\n",
                            "    <tr>\n",
                            "      <th>2</th>\n",
                            "      <td>3</td>\n",
                            "      <td>Jeff Bezos</td>\n",
                            "      <td>$114 billion</td>\n",
                            "      <td>59</td>\n",
                            "      <td>United\\nStates</td>\n",
                            "      <td>Amazon</td>\n",
                            "    </tr>\n",
                            "    <tr>\n",
                            "      <th>3</th>\n",
                            "      <td>4</td>\n",
                            "      <td>Larry Ellison</td>\n",
                            "      <td>$107 billion</td>\n",
                            "      <td>78</td>\n",
                            "      <td>United\\nStates</td>\n",
                            "      <td>Oracle Corporation</td>\n",
                            "    </tr>\n",
                            "    <tr>\n",
                            "      <th>4</th>\n",
                            "      <td>5</td>\n",
                            "      <td>Warren Buffett</td>\n",
                            "      <td>$106 billion</td>\n",
                            "      <td>92</td>\n",
                            "      <td>United\\nStates</td>\n",
                            "      <td>Berkshire Hathaway</td>\n",
                            "    </tr>\n",
                            "    <tr>\n",
                            "      <th>5</th>\n",
                            "      <td>6</td>\n",
                            "      <td>Bill Gates</td>\n",
                            "      <td>$104 billion</td>\n",
                            "      <td>67</td>\n",
                            "      <td>United\\nStates</td>\n",
                            "      <td>Microsoft</td>\n",
                            "    </tr>\n",
                            "    <tr>\n",
                            "      <th>6</th>\n",
                            "      <td>7</td>\n",
                            "      <td>Michael Bloomberg</td>\n",
                            "      <td>$94.5 billion</td>\n",
                            "      <td>81</td>\n",
                            "      <td>United\\nStates</td>\n",
                            "      <td>Bloomberg L.P.</td>\n",
                            "    </tr>\n",
                            "    <tr>\n",
                            "      <th>7</th>\n",
                            "      <td>8</td>\n",
                            "      <td>Carlos Slim &amp; family</td>\n",
                            "      <td>$93 billion</td>\n",
                            "      <td>83</td>\n",
                            "      <td>Mexico</td>\n",
                            "      <td>Telmex, América Móvil, Grupo\\nCarso</td>\n",
                            "    </tr>\n",
                            "    <tr>\n",
                            "      <th>8</th>\n",
                            "      <td>9</td>\n",
                            "      <td>Mukesh Ambani</td>\n",
                            "      <td>$83.4 billion</td>\n",
                            "      <td>65</td>\n",
                            "      <td>India</td>\n",
                            "      <td>Reliance Industries</td>\n",
                            "    </tr>\n",
                            "    <tr>\n",
                            "      <th>9</th>\n",
                            "      <td>10</td>\n",
                            "      <td>Steve Ballmer</td>\n",
                            "      <td>$80.7 billion</td>\n",
                            "      <td>67</td>\n",
                            "      <td>United\\nStates</td>\n",
                            "      <td>Microsoft</td>\n",
                            "    </tr>\n",
                            "  </tbody>\n",
                            "</table>\n",
                            "</div>"
                        ],
                        "text/plain": [
                            "  No.                       Name Net worth\\n(USD) Age     Nationality  \\\n",
                            "0   1  Bernard Arnault &\\nfamily     $211 billion  74          France   \n",
                            "1   2                  Elon Musk     $180 billion  51  United\\nStates   \n",
                            "2   3                 Jeff Bezos     $114 billion  59  United\\nStates   \n",
                            "3   4              Larry Ellison     $107 billion  78  United\\nStates   \n",
                            "4   5             Warren Buffett     $106 billion  92  United\\nStates   \n",
                            "5   6                 Bill Gates     $104 billion  67  United\\nStates   \n",
                            "6   7          Michael Bloomberg    $94.5 billion  81  United\\nStates   \n",
                            "7   8       Carlos Slim & family      $93 billion  83          Mexico   \n",
                            "8   9              Mukesh Ambani    $83.4 billion  65           India   \n",
                            "9  10              Steve Ballmer    $80.7 billion  67  United\\nStates   \n",
                            "\n",
                            "           Primary source(s) of wealth  \n",
                            "0                                 LVMH  \n",
                            "1               Tesla, SpaceX, X Corp.  \n",
                            "2                               Amazon  \n",
                            "3                   Oracle Corporation  \n",
                            "4                   Berkshire Hathaway  \n",
                            "5                            Microsoft  \n",
                            "6                       Bloomberg L.P.  \n",
                            "7  Telmex, América Móvil, Grupo\\nCarso  \n",
                            "8                  Reliance Industries  \n",
                            "9                            Microsoft  "
                        ]
                    },
                    "execution_count": 7,
                    "metadata": {},
                    "output_type": "execute_result"
                }
            ],
            "source": [
                "# shows list of top billionaires in 2023\n",
                "table_dfs[0]"
            ]
        },
        {
            "cell_type": "code",
            "execution_count": 8,
            "id": "0ba98e98-bfb6-4c54-8fa6-b0d7abf381c8",
            "metadata": {
                "tags": []
            },
            "outputs": [
                {
                    "data": {
                        "text/html": [
                            "<div>\n",
                            "<style scoped>\n",
                            "    .dataframe tbody tr th:only-of-type {\n",
                            "        vertical-align: middle;\n",
                            "    }\n",
                            "\n",
                            "    .dataframe tbody tr th {\n",
                            "        vertical-align: top;\n",
                            "    }\n",
                            "\n",
                            "    .dataframe thead th {\n",
                            "        text-align: right;\n",
                            "    }\n",
                            "</style>\n",
                            "<table border=\"1\" class=\"dataframe\">\n",
                            "  <thead>\n",
                            "    <tr style=\"text-align: right;\">\n",
                            "      <th></th>\n",
                            "      <th>Year</th>\n",
                            "      <th>Number of billionaires</th>\n",
                            "      <th>Group's combined net worth</th>\n",
                            "    </tr>\n",
                            "  </thead>\n",
                            "  <tbody>\n",
                            "    <tr>\n",
                            "      <th>0</th>\n",
                            "      <td>2023[2]</td>\n",
                            "      <td>2,640</td>\n",
                            "      <td>$12.2 trillion</td>\n",
                            "    </tr>\n",
                            "    <tr>\n",
                            "      <th>1</th>\n",
                            "      <td>2022[6]</td>\n",
                            "      <td>2,668</td>\n",
                            "      <td>$12.7 trillion</td>\n",
                            "    </tr>\n",
                            "    <tr>\n",
                            "      <th>2</th>\n",
                            "      <td>2021[11]</td>\n",
                            "      <td>2,755</td>\n",
                            "      <td>$13.1 trillion</td>\n",
                            "    </tr>\n",
                            "    <tr>\n",
                            "      <th>3</th>\n",
                            "      <td>2020</td>\n",
                            "      <td>2,095</td>\n",
                            "      <td>$8.0 trillion</td>\n",
                            "    </tr>\n",
                            "    <tr>\n",
                            "      <th>4</th>\n",
                            "      <td>2019</td>\n",
                            "      <td>2,153</td>\n",
                            "      <td>$8.7 trillion</td>\n",
                            "    </tr>\n",
                            "    <tr>\n",
                            "      <th>5</th>\n",
                            "      <td>2018</td>\n",
                            "      <td>2,208</td>\n",
                            "      <td>$9.1 trillion</td>\n",
                            "    </tr>\n",
                            "    <tr>\n",
                            "      <th>6</th>\n",
                            "      <td>2017</td>\n",
                            "      <td>2,043</td>\n",
                            "      <td>$7.7 trillion</td>\n",
                            "    </tr>\n",
                            "    <tr>\n",
                            "      <th>7</th>\n",
                            "      <td>2016</td>\n",
                            "      <td>1,810</td>\n",
                            "      <td>$6.5 trillion</td>\n",
                            "    </tr>\n",
                            "    <tr>\n",
                            "      <th>8</th>\n",
                            "      <td>2015[18]</td>\n",
                            "      <td>1,826</td>\n",
                            "      <td>$7.1 trillion</td>\n",
                            "    </tr>\n",
                            "    <tr>\n",
                            "      <th>9</th>\n",
                            "      <td>2014[67]</td>\n",
                            "      <td>1,645</td>\n",
                            "      <td>$6.4 trillion</td>\n",
                            "    </tr>\n",
                            "    <tr>\n",
                            "      <th>10</th>\n",
                            "      <td>2013[68]</td>\n",
                            "      <td>1,426</td>\n",
                            "      <td>$5.4 trillion</td>\n",
                            "    </tr>\n",
                            "    <tr>\n",
                            "      <th>11</th>\n",
                            "      <td>2012</td>\n",
                            "      <td>1,226</td>\n",
                            "      <td>$4.6 trillion</td>\n",
                            "    </tr>\n",
                            "    <tr>\n",
                            "      <th>12</th>\n",
                            "      <td>2011</td>\n",
                            "      <td>1,210</td>\n",
                            "      <td>$4.5 trillion</td>\n",
                            "    </tr>\n",
                            "    <tr>\n",
                            "      <th>13</th>\n",
                            "      <td>2010</td>\n",
                            "      <td>1,011</td>\n",
                            "      <td>$3.6 trillion</td>\n",
                            "    </tr>\n",
                            "    <tr>\n",
                            "      <th>14</th>\n",
                            "      <td>2009</td>\n",
                            "      <td>793</td>\n",
                            "      <td>$2.4 trillion</td>\n",
                            "    </tr>\n",
                            "    <tr>\n",
                            "      <th>15</th>\n",
                            "      <td>2008</td>\n",
                            "      <td>1,125</td>\n",
                            "      <td>$4.4 trillion</td>\n",
                            "    </tr>\n",
                            "    <tr>\n",
                            "      <th>16</th>\n",
                            "      <td>2007</td>\n",
                            "      <td>946</td>\n",
                            "      <td>$3.5 trillion</td>\n",
                            "    </tr>\n",
                            "    <tr>\n",
                            "      <th>17</th>\n",
                            "      <td>2006</td>\n",
                            "      <td>793</td>\n",
                            "      <td>$2.6 trillion</td>\n",
                            "    </tr>\n",
                            "    <tr>\n",
                            "      <th>18</th>\n",
                            "      <td>2005</td>\n",
                            "      <td>691</td>\n",
                            "      <td>$2.2 trillion</td>\n",
                            "    </tr>\n",
                            "    <tr>\n",
                            "      <th>19</th>\n",
                            "      <td>2004</td>\n",
                            "      <td>587</td>\n",
                            "      <td>$1.9 trillion</td>\n",
                            "    </tr>\n",
                            "    <tr>\n",
                            "      <th>20</th>\n",
                            "      <td>2003</td>\n",
                            "      <td>476</td>\n",
                            "      <td>$1.4 trillion</td>\n",
                            "    </tr>\n",
                            "    <tr>\n",
                            "      <th>21</th>\n",
                            "      <td>2002</td>\n",
                            "      <td>497</td>\n",
                            "      <td>$1.5 trillion</td>\n",
                            "    </tr>\n",
                            "    <tr>\n",
                            "      <th>22</th>\n",
                            "      <td>2001</td>\n",
                            "      <td>538</td>\n",
                            "      <td>$1.8 trillion</td>\n",
                            "    </tr>\n",
                            "    <tr>\n",
                            "      <th>23</th>\n",
                            "      <td>2000</td>\n",
                            "      <td>470</td>\n",
                            "      <td>$898 billion</td>\n",
                            "    </tr>\n",
                            "    <tr>\n",
                            "      <th>24</th>\n",
                            "      <td>Sources: Forbes.[18][67][66][68]</td>\n",
                            "      <td></td>\n",
                            "      <td></td>\n",
                            "    </tr>\n",
                            "  </tbody>\n",
                            "</table>\n",
                            "</div>"
                        ],
                        "text/plain": [
                            "                                Year Number of billionaires  \\\n",
                            "0                            2023[2]                  2,640   \n",
                            "1                            2022[6]                  2,668   \n",
                            "2                           2021[11]                  2,755   \n",
                            "3                               2020                  2,095   \n",
                            "4                               2019                  2,153   \n",
                            "5                               2018                  2,208   \n",
                            "6                               2017                  2,043   \n",
                            "7                               2016                  1,810   \n",
                            "8                           2015[18]                  1,826   \n",
                            "9                           2014[67]                  1,645   \n",
                            "10                          2013[68]                  1,426   \n",
                            "11                              2012                  1,226   \n",
                            "12                              2011                  1,210   \n",
                            "13                              2010                  1,011   \n",
                            "14                              2009                    793   \n",
                            "15                              2008                  1,125   \n",
                            "16                              2007                    946   \n",
                            "17                              2006                    793   \n",
                            "18                              2005                    691   \n",
                            "19                              2004                    587   \n",
                            "20                              2003                    476   \n",
                            "21                              2002                    497   \n",
                            "22                              2001                    538   \n",
                            "23                              2000                    470   \n",
                            "24  Sources: Forbes.[18][67][66][68]                          \n",
                            "\n",
                            "   Group's combined net worth  \n",
                            "0              $12.2 trillion  \n",
                            "1              $12.7 trillion  \n",
                            "2              $13.1 trillion  \n",
                            "3               $8.0 trillion  \n",
                            "4               $8.7 trillion  \n",
                            "5               $9.1 trillion  \n",
                            "6               $7.7 trillion  \n",
                            "7               $6.5 trillion  \n",
                            "8               $7.1 trillion  \n",
                            "9               $6.4 trillion  \n",
                            "10              $5.4 trillion  \n",
                            "11              $4.6 trillion  \n",
                            "12              $4.5 trillion  \n",
                            "13              $3.6 trillion  \n",
                            "14              $2.4 trillion  \n",
                            "15              $4.4 trillion  \n",
                            "16              $3.5 trillion  \n",
                            "17              $2.6 trillion  \n",
                            "18              $2.2 trillion  \n",
                            "19              $1.9 trillion  \n",
                            "20              $1.4 trillion  \n",
                            "21              $1.5 trillion  \n",
                            "22              $1.8 trillion  \n",
                            "23               $898 billion  \n",
                            "24                             "
                        ]
                    },
                    "execution_count": 8,
                    "metadata": {},
                    "output_type": "execute_result"
                }
            ],
            "source": [
                "# shows list of top billionaires\n",
                "table_dfs[1]"
            ]
        },
        {
            "attachments": {},
            "cell_type": "markdown",
            "id": "02967be4-be85-4046-b4e2-e5dd8e65628a",
            "metadata": {},
            "source": [
                "## Create Pandas Query Engines\n",
                "\n",
                "We create a pandas query engine over each structured table.\n",
                "\n",
                "These can be executed on their own to answer queries about each table."
            ]
        },
        {
            "cell_type": "code",
            "execution_count": 9,
            "id": "2f38bbf0-502a-44f7-b33b-443fcd90583a",
            "metadata": {
                "tags": []
            },
            "outputs": [],
            "source": [
                "# define query engines over these tables\n",
                "df_query_engines = [PandasQueryEngine(table_df) for table_df in table_dfs]"
            ]
        },
        {
            "cell_type": "code",
            "execution_count": 10,
            "id": "9fbc43a0-5655-4c8a-80d4-71c7fe9d7275",
            "metadata": {
                "tags": []
            },
            "outputs": [
                {
                    "name": "stdout",
                    "output_type": "stream",
                    "text": [
                        "df.iloc[1]['Net worth\\n(USD)']\n"
                    ]
                },
                {
                    "data": {
                        "text/plain": [
                            "Response(response='$180\\xa0billion', source_nodes=[], metadata={'pandas_instruction_str': \"\\ndf.iloc[1]['Net worth\\\\n(USD)']\"})"
                        ]
                    },
                    "execution_count": 10,
                    "metadata": {},
                    "output_type": "execute_result"
                }
            ],
            "source": [
                "df_query_engines[0].query(\n",
                "    \"What's the net worth of the second richest billionaire in 2023?\"\n",
                ")"
            ]
        },
        {
            "cell_type": "code",
            "execution_count": 41,
            "id": "78dbe39d-eea0-4c7f-bea7-2c0f6f6591cd",
            "metadata": {
                "tags": []
            },
            "outputs": [
                {
                    "name": "stdout",
                    "output_type": "stream",
                    "text": [
                        "df[df['Year'] == '2009']['Number of billionaires'].iloc[0]\n"
                    ]
                },
                {
                    "data": {
                        "text/plain": [
                            "Response(response='793', source_nodes=[], metadata={'pandas_instruction_str': \"\\ndf[df['Year'] == '2009']['Number of billionaires'].iloc[0]\"})"
                        ]
                    },
                    "execution_count": 41,
                    "metadata": {},
                    "output_type": "execute_result"
                }
            ],
            "source": [
                "df_query_engines[1].query(\"How many billionaires were there in 2009?\")"
            ]
        },
        {
            "attachments": {},
            "cell_type": "markdown",
            "id": "e9bca53c-8766-42e4-96c7-145e7f14be34",
            "metadata": {},
            "source": [
                "## Build Vector Index\n",
                "\n",
                "Build vector index over the chunked document as well as over the additional `IndexNode` objects linked to the tables."
            ]
        },
        {
            "cell_type": "code",
            "execution_count": 42,
            "id": "fffd5616-4256-46da-a765-10476a1ee1ee",
            "metadata": {
                "tags": []
            },
            "outputs": [],
            "source": [
                "llm = OpenAI(temperature=0, model=\"gpt-4\")\n",
                "\n",
                "service_context = ServiceContext.from_defaults(\n",
                "    llm=llm,\n",
                ")"
            ]
        },
        {
            "cell_type": "code",
            "execution_count": 43,
            "id": "6a8b6605-823a-4607-be0e-99c67d5a90ff",
            "metadata": {
                "tags": []
            },
            "outputs": [],
            "source": [
                "doc_nodes = service_context.node_parser.get_nodes_from_documents(docs)"
            ]
        },
        {
            "cell_type": "code",
            "execution_count": 61,
            "id": "6a3d249c-2242-4158-88ea-d16b67815107",
            "metadata": {
                "tags": []
            },
            "outputs": [],
            "source": [
                "# define index nodes\n",
                "summaries = [\n",
                "    \"This node provides information about the world's richest billionaires in 2023\",\n",
                "    \"This node provides information on the number of billionaires and their combined net worth from 2000 to 2023.\",\n",
                "]\n",
                "\n",
                "df_nodes = [\n",
                "    IndexNode(text=summary, index_id=f\"pandas{idx}\")\n",
                "    for idx, summary in enumerate(summaries)\n",
                "]\n",
                "\n",
                "df_id_query_engine_mapping = {\n",
                "    f\"pandas{idx}\": df_query_engine\n",
                "    for idx, df_query_engine in enumerate(df_query_engines)\n",
                "}"
            ]
        },
        {
            "cell_type": "code",
            "execution_count": 62,
            "id": "abf87458-cd3d-4ac4-b934-20ee8a9a820a",
            "metadata": {
                "tags": []
            },
            "outputs": [],
            "source": [
                "# construct top-level vector index + query engine\n",
                "vector_index = VectorStoreIndex(doc_nodes + df_nodes)\n",
                "vector_retriever = vector_index.as_retriever(similarity_top_k=1)"
            ]
        },
        {
            "attachments": {},
            "cell_type": "markdown",
            "id": "6ed917f6-e407-4ebb-9c15-36aedd207c6f",
            "metadata": {},
            "source": [
                "## Use `RecursiveRetriever` in our `RetrieverQueryEngine`\n",
                "\n",
                "We define a `RecursiveRetriever` object to recursively retrieve/query nodes. We then put this in our `RetrieverQueryEngine` along with a `ResponseSynthesizer` to synthesize a response.\n",
                "\n",
                "We pass in mappings from id to retriever and id to query engine. We then pass in a root id representing the retriever we query first."
            ]
        },
        {
            "cell_type": "code",
            "execution_count": 63,
            "id": "9a7d6031-f7a0-45c2-9a84-813b4e3fcf28",
            "metadata": {
                "tags": []
            },
            "outputs": [],
            "source": [
                "# baseline vector index (that doesn't include the extra df nodes).\n",
                "# used to benchmark\n",
                "vector_index0 = VectorStoreIndex(doc_nodes)\n",
                "vector_query_engine0 = vector_index0.as_query_engine()"
            ]
        },
        {
            "cell_type": "code",
            "execution_count": 64,
            "id": "e38d1e90-7d83-46bb-99d7-0892aef4d3ca",
            "metadata": {
                "tags": []
            },
            "outputs": [],
            "source": [
                "from llama_index.retrievers import RecursiveRetriever\n",
                "from llama_index.query_engine import RetrieverQueryEngine\n",
                "from llama_index.response_synthesizers import get_response_synthesizer\n",
                "\n",
                "recursive_retriever = RecursiveRetriever(\n",
                "    \"vector\",\n",
                "    retriever_dict={\"vector\": vector_retriever},\n",
                "    query_engine_dict=df_id_query_engine_mapping,\n",
                "    verbose=True,\n",
                ")\n",
                "\n",
                "response_synthesizer = get_response_synthesizer(\n",
                "    # service_context=service_context,\n",
                "    response_mode=\"compact\"\n",
                ")\n",
                "\n",
                "query_engine = RetrieverQueryEngine.from_args(\n",
                "    recursive_retriever, response_synthesizer=response_synthesizer\n",
                ")"
            ]
        },
        {
            "cell_type": "code",
            "execution_count": 65,
            "id": "6af6823d-bd07-4088-8422-bd9aa3224b08",
            "metadata": {
                "tags": []
            },
            "outputs": [
                {
                    "name": "stdout",
                    "output_type": "stream",
                    "text": [
                        "\u001b[36;1m\u001b[1;3mRetrieving with query id None: What's the net worth of the second richest billionaire in 2023?\n",
                        "\u001b[0m\u001b[38;5;200m\u001b[1;3mRetrieved node with id, entering: pandas0\n",
                        "\u001b[0m\u001b[36;1m\u001b[1;3mRetrieving with query id pandas0: What's the net worth of the second richest billionaire in 2023?\n",
                        "\u001b[0mdf.iloc[1]['Net worth\\n(USD)']\n",
                        "\u001b[32;1m\u001b[1;3mGot response: $180 billion\n",
                        "\u001b[0m"
                    ]
                }
            ],
            "source": [
                "response = query_engine.query(\n",
                "    \"What's the net worth of the second richest billionaire in 2023?\"\n",
                ")"
            ]
        },
        {
            "cell_type": "code",
            "execution_count": 66,
            "id": "6122ff8e-f84d-47cc-b363-44621e0623ab",
            "metadata": {
                "tags": []
            },
            "outputs": [
                {
                    "data": {
                        "text/plain": [
                            "\"Query: What's the net worth of the second richest billionaire in 2023?\\nResponse: $180\\xa0billion\""
                        ]
                    },
                    "execution_count": 66,
                    "metadata": {},
                    "output_type": "execute_result"
                }
            ],
            "source": [
                "response.source_nodes[0].node.get_content()"
            ]
        },
        {
            "cell_type": "code",
            "execution_count": 67,
            "id": "4096f30c-c6a3-4f74-b496-1240fdc08fd4",
            "metadata": {
                "tags": []
            },
            "outputs": [
                {
                    "data": {
                        "text/plain": [
                            "'\\n$180 billion'"
                        ]
                    },
                    "execution_count": 67,
                    "metadata": {},
                    "output_type": "execute_result"
                }
            ],
            "source": [
                "str(response)"
            ]
        },
        {
            "cell_type": "code",
            "execution_count": 68,
            "id": "5cf133a7-2532-4179-af21-d495eb547083",
            "metadata": {},
            "outputs": [
                {
                    "name": "stdout",
                    "output_type": "stream",
                    "text": [
                        "\u001b[36;1m\u001b[1;3mRetrieving with query id None: How many billionaires were there in 2009?\n",
                        "\u001b[0m\u001b[38;5;200m\u001b[1;3mRetrieved node with id, entering: pandas1\n",
                        "\u001b[0m\u001b[36;1m\u001b[1;3mRetrieving with query id pandas1: How many billionaires were there in 2009?\n",
                        "\u001b[0mdf[df['Year'] == '2009']['Number of billionaires'].iloc[0]\n",
                        "\u001b[32;1m\u001b[1;3mGot response: 793\n",
                        "\u001b[0m"
                    ]
                }
            ],
            "source": [
                "response = query_engine.query(\"How many billionaires were there in 2009?\")"
            ]
        },
        {
            "cell_type": "code",
            "execution_count": 69,
            "id": "489e4c59-a40c-47b8-a788-e80558ce7e3a",
            "metadata": {
                "tags": []
            },
            "outputs": [
                {
                    "data": {
                        "text/plain": [
                            "'\\n793'"
                        ]
                    },
                    "execution_count": 69,
                    "metadata": {},
                    "output_type": "execute_result"
                }
            ],
            "source": [
                "str(response)"
            ]
        },
        {
            "cell_type": "code",
            "execution_count": 35,
            "id": "551781f3-8761-4385-966d-a0a6b0526ed6",
            "metadata": {
                "tags": []
            },
            "outputs": [],
            "source": [
                "response = vector_query_engine0.query(\n",
                "    \"What's the net worth of the second richest billionaire in 2023?\"\n",
                ")"
            ]
        },
        {
            "cell_type": "code",
            "execution_count": 36,
            "id": "3656db61-32bc-49c6-9774-8439323358a5",
            "metadata": {
                "tags": []
            },
            "outputs": [
                {
                    "name": "stdout",
                    "output_type": "stream",
                    "text": [
                        "7/1/23, 11:31 PM\n",
                        "The World's Billionaires - Wikipedia\n",
                        "https://en.wikipedia.org/wiki/The_World%27s_Billionaires\n",
                        "3/33\n",
                        "No.\n",
                        "Name\n",
                        "Net worth\n",
                        "(USD)\n",
                        "Age\n",
                        "Nationality\n",
                        "Primary source(s) of wealth\n",
                        "1 \n",
                        "Bernard Arnault &\n",
                        "family\n",
                        "$211 billion \n",
                        "74\n",
                        " France\n",
                        "LVMH\n",
                        "2 \n",
                        "Elon Musk\n",
                        "$180 billion \n",
                        "51\n",
                        " United\n",
                        "States\n",
                        "Tesla, SpaceX, X Corp.\n",
                        "3 \n",
                        "Jeff Bezos\n",
                        "$114 billion \n",
                        "59\n",
                        " United\n",
                        "States\n",
                        "Amazon\n",
                        "4 \n",
                        "Larry Ellison\n",
                        "$107 billion \n",
                        "78\n",
                        " United\n",
                        "States\n",
                        "Oracle Corporation\n",
                        "5 \n",
                        "Warren Buffett\n",
                        "$106 billion \n",
                        "92\n",
                        " United\n",
                        "States\n",
                        "Berkshire Hathaway\n",
                        "6 \n",
                        "Bill Gates\n",
                        "$104 billion \n",
                        "67\n",
                        " United\n",
                        "States\n",
                        "Microsoft\n",
                        "7 \n",
                        "Michael Bloomberg\n",
                        "$94.5 billion \n",
                        "81\n",
                        " United\n",
                        "States\n",
                        "Bloomberg L.P.\n",
                        "8 \n",
                        "Carlos Slim & family\n",
                        "$93 billion \n",
                        "83\n",
                        " Mexico\n",
                        "Telmex, América Móvil, Grupo\n",
                        "Carso\n",
                        "9 \n",
                        "Mukesh Ambani\n",
                        "$83.4 billion \n",
                        "65\n",
                        " India\n",
                        "Reliance Industries\n",
                        "10 \n",
                        "Steve Ballmer\n",
                        "$80.7 billion \n",
                        "67\n",
                        " United\n",
                        "States\n",
                        "Microsoft\n",
                        "In the 36th annual Forbes list of the world's billionaires, the list included 2,668 billionaires with a\n",
                        "total net wealth of $12.7 trillion, down 97 members from 2021.[6]\n",
                        "2022\n"
                    ]
                }
            ],
            "source": [
                "print(response.source_nodes[1].node.get_content())"
            ]
        },
        {
            "cell_type": "code",
            "execution_count": 37,
            "id": "89b47692-95f0-4077-9439-d8af237e5c16",
            "metadata": {
                "tags": []
            },
            "outputs": [
                {
                    "name": "stdout",
                    "output_type": "stream",
                    "text": [
                        "\n",
                        "The net worth of the second richest billionaire in 2023 is $211 billion.\n"
                    ]
                }
            ],
            "source": [
                "print(str(response))"
            ]
        },
        {
            "cell_type": "code",
            "execution_count": 38,
            "id": "1fbe211d-d59b-4f87-897c-0fa03e42641e",
            "metadata": {
                "tags": []
            },
            "outputs": [
                {
                    "data": {
                        "text/plain": [
                            "\"7/1/23, 11:31 PM\\nThe World's Billionaires - Wikipedia\\nhttps://en.wikipedia.org/wiki/The_World%27s_Billionaires\\n3/33\\nNo.\\nName\\nNet worth\\n(USD)\\nAge\\nNationality\\nPrimary source(s) of wealth\\n1 \\nBernard Arnault &\\nfamily\\n$211\\xa0billion\\xa0\\n74\\n\\xa0France\\nLVMH\\n2 \\nElon Musk\\n$180\\xa0billion\\xa0\\n51\\n\\xa0United\\nStates\\nTesla, SpaceX, X Corp.\\n3 \\nJeff Bezos\\n$114\\xa0billion\\xa0\\n59\\n\\xa0United\\nStates\\nAmazon\\n4 \\nLarry Ellison\\n$107\\xa0billion\\xa0\\n78\\n\\xa0United\\nStates\\nOracle Corporation\\n5 \\nWarren Buffett\\n$106\\xa0billion\\xa0\\n92\\n\\xa0United\\nStates\\nBerkshire Hathaway\\n6 \\nBill Gates\\n$104\\xa0billion\\xa0\\n67\\n\\xa0United\\nStates\\nMicrosoft\\n7 \\nMichael Bloomberg\\n$94.5\\xa0billion\\xa0\\n81\\n\\xa0United\\nStates\\nBloomberg L.P.\\n8 \\nCarlos Slim & family\\n$93\\xa0billion\\xa0\\n83\\n\\xa0Mexico\\nTelmex, América Móvil, Grupo\\nCarso\\n9 \\nMukesh Ambani\\n$83.4\\xa0billion \\n65\\n\\xa0India\\nReliance Industries\\n10 \\nSteve Ballmer\\n$80.7\\xa0billion\\xa0\\n67\\n\\xa0United\\nStates\\nMicrosoft\\nIn the 36th annual Forbes list of the world's billionaires, the list included 2,668 billionaires with a\\ntotal net wealth of $12.7 trillion, down 97 members from 2021.[6]\\n2022\""
                        ]
                    },
                    "execution_count": 38,
                    "metadata": {},
                    "output_type": "execute_result"
                }
            ],
            "source": [
                "response.source_nodes[1].node.get_content()"
            ]
        },
        {
            "cell_type": "code",
            "execution_count": null,
            "id": "960c81f6-b7a6-43ac-aa4b-9ef20a5400b1",
            "metadata": {},
            "outputs": [],
            "source": []
        }
    ],
    "metadata": {
        "kernelspec": {
            "display_name": "llama_index_v2",
            "language": "python",
            "name": "llama_index_v2"
        },
        "language_info": {
            "codemirror_mode": {
                "name": "ipython",
                "version": 3
            },
            "file_extension": ".py",
            "mimetype": "text/x-python",
            "name": "python",
            "nbconvert_exporter": "python",
            "pygments_lexer": "ipython3",
            "version": "3.10.10"
        }
    },
    "nbformat": 4,
    "nbformat_minor": 5
}